
Add properties in Evaluation Result - Custom Evaluator extra fields. #46077

Merged
w-javed merged 10 commits into main from waqasjaved02/aoai-properties-passthrough
Apr 3, 2026

Conversation

w-javed (Contributor) commented Apr 2, 2026

The Evaluation Result is OpenAI-compliant: it contains score, label, reason, etc. We need to surface more evaluation outputs in the UI, so this PR introduces a property bag that can carry additional details per evaluator. The science team will also use this property bag to expose more outputs per evaluator.
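As a sketch of the shape being described (the evaluator name and fields here are illustrative, not the actual SDK surface), a custom evaluator could return its extra outputs in a properties bag alongside the standard OpenAI-style fields:

```python
# Hypothetical custom evaluator: names and fields are illustrative only.
def coherence_with_details(*, response: str) -> dict:
    """Return an OpenAI-style evaluation result plus a properties bag."""
    score = 4.0  # stand-in for a real scoring call
    return {
        "score": score,
        "label": "pass" if score >= 3.0 else "fail",
        "reason": "Response stays on topic.",
        "threshold": 3.0,
        "passed": score >= 3.0,
        # Extra per-evaluator outputs travel in this bag instead of
        # being flattened into the top-level result object.
        "properties": {"sentence_count": 3, "model_version": "demo-1"},
    }

result = coherence_with_details(response="Example answer.")
print(result["properties"]["sentence_count"])  # → 3
```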


@w-javed w-javed requested a review from a team as a code owner April 2, 2026 07:11
Copilot AI review requested due to automatic review settings April 2, 2026 07:11
@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Apr 2, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Adds support for passing through custom evaluator “extra fields” via a properties bag into the AOAI-style evaluation result objects produced by the evaluation results converter.

Changes:

  • Update _extract_metric_values to detect an outputs.<criteria>.properties dict and propagate it onto per-metric extracted values.
  • Update _create_result_object to include properties in the final AOAI result payload when present.
  • Add a unit test asserting properties is preserved and not flattened into the top-level result object.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Reviewed files:

  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py: Propagates a per-criteria properties dict into per-metric result objects during AOAI conversion.
  • sdk/evaluation/azure-ai-evaluation/tests/unittests/test_evaluate.py: Adds coverage validating properties passthrough behavior for a custom evaluator result row.
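The propagation described in the overview can be sketched as follows; the function and key names are simplified assumptions, not the SDK's actual `_extract_metric_values` code:

```python
import logging

logger = logging.getLogger(__name__)

def extract_metric_values(criteria_outputs: dict) -> dict:
    """Split one criteria's outputs into per-metric dicts, carrying a
    'properties' bag through to each metric entry when present.

    Simplified stand-in for the SDK's _extract_metric_values.
    """
    outputs = dict(criteria_outputs)  # do not mutate the caller's row
    properties = outputs.pop("properties", None)
    if properties is not None and not isinstance(properties, dict):
        # Mirrors the PR's guard: non-dict properties are logged and dropped.
        logger.info("Ignoring non-dict 'properties': %r", properties)
        properties = None

    # The real code builds one entry per metric; one metric suffices here.
    result_per_metric = {"custom_metric": outputs}
    if properties:
        for metric_dict in result_per_metric.values():
            # Copy so metrics do not share one mutable dict.
            metric_dict["properties"] = properties.copy()
    return result_per_metric

row = {"score": 4.0, "passed": True, "properties": {"notes": "extra"}}
print(extract_metric_values(row)["custom_metric"]["properties"])  # → {'notes': 'extra'}
```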

feat(evaluation): support properties passthrough in AOAI evaluation results

Pass through evaluator properties dict in AOAI evaluation results.
When an evaluator returns a properties dict, it is included alongside
score, label, reason, threshold, and passed in the result object.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@w-javed w-javed force-pushed the waqasjaved02/aoai-properties-passthrough branch from 2fd8ce2 to c8f5958 Compare April 2, 2026 07:19
Update _extract_metric_values and _create_result_object docstrings
to document the new properties field and its expected dict type.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
w-javed and others added 5 commits April 2, 2026 18:33
Address PR review: warn users when their custom evaluator returns
'properties' as a non-dict type so they can fix the output format.
Also add properties to _create_result_object example input.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@slister1001 slister1001 force-pushed the waqasjaved02/aoai-properties-passthrough branch from e9b6431 to 5ec9b95 Compare April 3, 2026 14:24
slister1001 and others added 2 commits April 3, 2026 10:25
Remove erroneous space in self._eval_metric. value (two occurrences) that
would cause an AttributeError at runtime when building result keys for
_details and _total_tokens fields.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove empty changelog sections to fix Build Analyze check

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nagkumar91 (Member) left a comment

Review — Properties passthrough

  1. Shared dict reference (bug risk): in _extract_metric_values, the same properties object is assigned to every metric entry:

for metric_dict in result_per_metric.values():
    metric_dict["properties"] = properties  # same object reference

If anything downstream mutates one entry's properties, all entries are affected. Consider metric_dict["properties"] = properties.copy() (or copy.deepcopy if nested dicts matter).

  2. No test for the warning path: the isinstance(metric_value, dict) guard logs a warning when properties isn't a dict, but no test covers this branch. A quick test passing properties="not_a_dict" would confirm the warning fires and properties is omitted.

  3. Typo fixes in _base_rai_svc_eval.py: good catch on changing self._eval_metric. value to self._eval_metric.value. Cosmetic (Python allows whitespace after the dot) but worth cleaning up.
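The shared-reference risk noted above can be demonstrated in a few lines (illustrative, not the SDK code); note that a shallow copy() protects the top level only, while deepcopy also protects nested dicts:

```python
import copy

properties = {"details": {"count": 1}}

# Shared reference: every metric entry points at the same dict object.
metrics_shared = {"m1": {}, "m2": {}}
for metric_dict in metrics_shared.values():
    metric_dict["properties"] = properties
metrics_shared["m1"]["properties"]["new_key"] = "oops"
print("new_key" in metrics_shared["m2"]["properties"])  # → True: mutation leaked

# Deep copy: nested dicts become independent per metric.
metrics_copied = {"m1": {}, "m2": {}}
for metric_dict in metrics_copied.values():
    metric_dict["properties"] = copy.deepcopy(properties)
metrics_copied["m1"]["properties"]["details"]["count"] = 99
print(metrics_copied["m2"]["properties"]["details"]["count"])  # → 1, unchanged
```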

- Use properties.copy() to avoid shared dict reference across metrics
- Add test for non-dict properties logging and omission
- Change properties type mismatch log level from warning to info

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nagkumar91 (Member) left a comment

All review items addressed — properties.copy(), non-dict test coverage, and log level adjustment. LGTM.
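The non-dict coverage mentioned here could resemble this pytest sketch; `normalize_properties` and the logger name are hypothetical stand-ins, not the SDK's actual helper, and the log level follows the PR's final choice of info:

```python
import logging

logger = logging.getLogger("aoai_converter_demo")

def normalize_properties(row: dict) -> dict:
    """Drop 'properties' and log when it is not a dict (stand-in logic)."""
    props = row.get("properties")
    if props is not None and not isinstance(props, dict):
        logger.info("'properties' must be a dict, got %s", type(props).__name__)
        return {k: v for k, v in row.items() if k != "properties"}
    return dict(row)

def test_non_dict_properties_logs_and_is_omitted(caplog):
    # caplog is pytest's built-in fixture for capturing log records.
    with caplog.at_level(logging.INFO, logger="aoai_converter_demo"):
        result = normalize_properties({"score": 1.0, "properties": "not_a_dict"})
    assert "properties" not in result
    assert any("must be a dict" in msg for msg in caplog.messages)
```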

@w-javed w-javed merged commit cc3eaa7 into main Apr 3, 2026
21 checks passed
@w-javed w-javed deleted the waqasjaved02/aoai-properties-passthrough branch April 3, 2026 19:52
slister1001 added a commit that referenced this pull request Apr 3, 2026
Add properties in Evaluation Result - Custom Evaluator extra fields. (#46077)

* feat(evaluation): support properties passthrough in AOAI evaluation results

Pass through evaluator properties dict in AOAI evaluation results.
When an evaluator returns a properties dict, it is included alongside
score, label, reason, threshold, and passed in the result object.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: update docstrings for properties passthrough per PR review

Update _extract_metric_values and _create_result_object docstrings
to document the new properties field and its expected dict type.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: log warning when properties is not a dict

Address PR review: warn users when their custom evaluator returns
'properties' as a non-dict type so they can fix the output format.
Also add properties to _create_result_object example input.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Release-1-16-4

* Fix stray space in _eval_metric.value attribute access

Remove erroneous space in self._eval_metric. value (two occurrences) that
would cause an AttributeError at runtime when building result keys for
_details and _total_tokens fields.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Remove empty changelog sections to fix Build Analyze check

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR feedback: copy properties dict and add non-dict test

- Use properties.copy() to avoid shared dict reference across metrics
- Add test for non-dict properties logging and omission
- Change properties type mismatch log level from warning to info

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Sydney Lister <sydneylister@microsoft.com>